Conversation

@belloibrahv
Contributor
The implementation follows the approach of other LLM providers and uses the BEDROCK_API_KEY and BEDROCK_REGION environment variables for authentication.
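For reference, the two variables can be exported in the shell before running anything that calls Bedrock (the values below are placeholders, not real credentials):

```shell
# Placeholder values for illustration; substitute your own AWS Bedrock credentials.
export BEDROCK_API_KEY="your-bedrock-api-key"
export BEDROCK_REGION="us-east-1"
```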

This resolves issue #1162.

@belloibrahv
Contributor Author

@georgeh0 @badmonster0 please help me review this PR

@badmonster0
Member

Thanks @belloibrahv, could you share how you tested this? Which example did you use, and what do the results look like?

@belloibrahv
Contributor Author

Hi @badmonster0,

Thanks for the feedback. I've updated the pull request to include a new example that demonstrates how to test the AWS Bedrock integration.

Testing Example and Workflow

To answer your question about testing, I've added a new example located at examples/bedrock_llm_extraction. This example is now part of the PR and serves as a live demonstration of the feature.

Workflow

The example demonstrates a real-world use case:

  1. It reads PDF files from a local directory.
  2. It converts them to Markdown.
  3. It uses the new Bedrock LLM integration to extract structured data (ModuleInfo) from the Markdown content.
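As a rough sketch of step 3, the extraction target can be thought of as a plain dataclass populated from the model's JSON output. The field names below mirror the example's query results, but the raw response string is hypothetical, and in the real flow cocoindex handles this parsing internally:

```python
import json
from dataclasses import dataclass


@dataclass
class ModuleInfo:
    """Structured data extracted from one module's Markdown page."""

    title: str
    num_classes: int
    num_methods: int


# Hypothetical raw LLM response for one document (not actual Bedrock output).
raw = '{"title": "asyncio — Asynchronous", "num_classes": 0, "num_methods": 0}'
info = ModuleInfo(**json.loads(raw))
```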

How to Run the Test

You can run the test by following the instructions in the new examples/bedrock_llm_extraction/README.md file. Here's a summary of the steps:

  1. Set up the environment: Copy the examples/bedrock_llm_extraction/.env.example file to .env and fill in your AWS Bedrock credentials.

    cp examples/bedrock_llm_extraction/.env.example examples/bedrock_llm_extraction/.env
  2. Run the pipeline: From the root of the project, execute:

    pip install -e ./examples/bedrock_llm_extraction
    cocoindex setup examples/bedrock_llm_extraction/main.py
    cocoindex update examples/bedrock_llm_extraction/main.py

Expected Results

After a successful run, a modules_info table will be populated in your Postgres database. You can verify the extracted data with the following SQL query:

    SELECT filename, module_info->'title' AS title, module_summary FROM modules_info;

The output should look similar to this:

           filename      |          title           |            module_summary
    ---------------------+--------------------------+--------------------------------------
     manuals/asyncio.pdf | "asyncio — Asynchronous" | {"num_classes": 0, "num_methods": 0}
     manuals/json.pdf    | "json — JSON encoder"    | {"num_classes": 0, "num_methods": 0}

Current Status

I've fully prepared this example and confirmed that it's ready for verification. However, I'm currently blocked from running the final test myself due to an issue with my AWS account verification, which is preventing me from getting a BEDROCK_API_KEY.

Since the example is now included in the PR, I hope it makes it easy for you to test the changes on your end.

Thanks again for your guidance and support.

This commit adds support for AWS Bedrock for LLM parsing.

The implementation follows the approach of other LLM providers and uses the `BEDROCK_API_KEY` and `BEDROCK_REGION` environment variables for authentication.

This resolves issue cocoindex-io#1162.
@belloibrahv belloibrahv force-pushed the feature/add-aws-bedrock-llm-support branch from ddc60ae to 736b580 Compare October 9, 2025 15:29
Member

@georgeh0 georgeh0 left a comment


Can you also help update the doc to add this new LLM integration? Thanks!

Comment on lines 1067 to 1085
def test_llm_api_type_bedrock() -> None:
    """Test that LlmApiType.BEDROCK is available and works."""
    from cocoindex.llm import LlmApiType, LlmSpec

    # Test enum availability
    assert hasattr(LlmApiType, "BEDROCK")
    assert LlmApiType.BEDROCK.value == "Bedrock"

    # Test LlmSpec creation with Bedrock
    spec = LlmSpec(
        api_type=LlmApiType.BEDROCK,
        model="us.anthropic.claude-3-5-haiku-20241022-v1:0",
    )

    assert spec.api_type == LlmApiType.BEDROCK
    assert spec.model == "us.anthropic.claude-3-5-haiku-20241022-v1:0"
    assert spec.address is None
    assert spec.api_config is None


Member


This test is unnecessary; it doesn't test any logic that really needs to be tested. Please remove it.

Member


We don't need to fork the example for just another LLM API integration. We only need to add a few lines in the existing example like this

Contributor Author


I have implemented the changes as requested. Kindly review when you have time. Thank you!

- Update documentation to include Bedrock LLM integration.
- Remove unnecessary test for LlmApiType.BEDROCK.
- Add Bedrock to the existing manuals_llm_extraction example instead of creating a new one.
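The third change above can be sketched roughly as a provider switch on environment variables. This is a stdlib-only illustration; the helper and the fallback provider name are hypothetical, not the actual diff:

```python
import os


def choose_api_type() -> str:
    """Pick Bedrock when its credentials are present (illustrative helper)."""
    if os.environ.get("BEDROCK_API_KEY") and os.environ.get("BEDROCK_REGION"):
        # "Bedrock" matches LlmApiType.BEDROCK.value introduced in this PR.
        return "Bedrock"
    return "OpenAi"  # hypothetical fallback provider
```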
@belloibrahv belloibrahv requested a review from georgeh0 October 11, 2025 10:32
@georgeh0 georgeh0 merged commit fa268d9 into cocoindex-io:main Oct 13, 2025
9 checks passed
@badmonster0
Member

hi @belloibrahv https://cocoindex.io/blogs/cocoindex-changelog-2025-10-19#belloibrahv latest release note out and we have a section for you, thanks for contributing!
